Reinforcement Learning for Games: Failures and Successes. CMA-ES and TDL in Comparison
Authors
Abstract
We apply CMA-ES, an evolution strategy with covariance matrix adaptation, and TDL (Temporal Difference Learning) to reinforcement learning tasks. In both cases these algorithms seek to optimize a neural network which provides the policy for playing a simple game (TicTacToe). Our contribution is to study the effect of varying learning conditions on learning speed and quality. Certain initial failures with ill-suited fitness functions led to the development of new fitness functions, which allow fast learning. These new fitness functions in combination with CMA-ES reduce the number of games needed for training to the same order of magnitude as TDL. The selection of suitable features is also of critical importance for the learning success. We show that using the raw board position as an input feature is not very effective: it is orders of magnitude slower than feature sets which exploit the symmetry of the game. We develop a measure, "feature set utility" (FU), which allows a given feature set to be characterized in advance. We show that the lower bound provided by FU is largely in accordance with the results from our repeated experiments for two very different learning algorithms, CMA-ES and TDL.
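As one illustration of why symmetry-exploiting features help, the following sketch (not the paper's implementation; the board encoding and function names are our own) maps every TicTacToe position to a canonical representative under the board's eight symmetries (four rotations, each with an optional mirror), so that symmetric positions share a single feature vector:

```python
def rotate(board):
    """Rotate a 3x3 board (tuple of 9 cells, row-major) 90 degrees clockwise."""
    return tuple(board[6 - 3 * (i % 3) + i // 3] for i in range(9))

def reflect(board):
    """Mirror the board left-right."""
    return tuple(board[3 * (i // 3) + 2 - i % 3] for i in range(9))

def canonical(board):
    """Lexicographically smallest board among all 8 symmetric variants."""
    variants, b = [], tuple(board)
    for _ in range(4):
        variants += [b, reflect(b)]
        b = rotate(b)
    return min(variants)
```

Since symmetric positions collapse onto one representative, the learner has to estimate values for far fewer distinct states, which is one plausible reason symmetry-aware feature sets train faster than the raw board encoding.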
Similar Papers
A Covariance Matrix Adaptation Evolution Strategy for Direct Policy Search in Reproducing Kernel Hilbert Space
The covariance matrix adaptation evolution strategy (CMA-ES) is an efficient derivative-free optimization algorithm. It optimizes a black-box objective function over a well-defined parameter space. In some problems, such parameter spaces are defined using function approximation in which feature functions are manually defined. Therefore, the performance of those techniques strongly depends on the...
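To make the black-box setting concrete, here is a deliberately simplified evolution strategy (isotropic Gaussian sampling with a crude step-size decay, not full CMA-ES with covariance adaptation); it shows the sample-evaluate-recombine loop that CMA-ES also follows, with all parameter values chosen for illustration only:

```python
import random

def simple_es(f, x0, sigma=0.3, pop=20, elite=5, iters=300, seed=0):
    """Minimise a black-box f by sampling around a mean and averaging the elite."""
    rng = random.Random(seed)
    mean = list(x0)
    for _ in range(iters):
        # sample a population of candidate parameter vectors around the mean
        samples = [[m + rng.gauss(0.0, sigma) for m in mean] for _ in range(pop)]
        samples.sort(key=f)  # best (lowest f) first
        # recombination: new mean is the average of the elite samples
        mean = [sum(c) / elite for c in zip(*samples[:elite])]
        sigma *= 0.98  # fixed geometric decay; CMA-ES instead adapts step size online
    return mean
```

The key difference in real CMA-ES is that the sampling distribution's full covariance matrix is adapted from the selected samples, so the search automatically aligns with correlated directions in parameter space.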
Uncertainty Handling in Evolutionary Direct Policy Search
Uncertainty arises in reinforcement learning from various sources. Therefore it is necessary to consider statistics based on several roll-outs for evaluating behavioral policies. An adaptive uncertainty handling is added to the CMA-ES, a variable metric evolution strategy proposed for direct policy search. The uncertainty handling dynamically adjusts the number of episodes considered in each ev...
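The need for several roll-outs can be seen in a toy model (the names and the Gaussian noise model below are our own illustration, not from the paper): a single noisy episode return is a high-variance fitness estimate, while averaging n episodes shrinks the estimator's standard deviation by roughly a factor of 1/sqrt(n):

```python
import random

def rollout(policy_strength, rng):
    """Hypothetical noisy episode return for a policy of the given true strength."""
    return policy_strength + rng.gauss(0.0, 1.0)

def fitness(policy_strength, n_episodes, rng):
    """Fitness estimate: average return over n_episodes independent roll-outs."""
    return sum(rollout(policy_strength, rng) for _ in range(n_episodes)) / n_episodes
```

An adaptive scheme such as the one described above would increase n_episodes only when the noise actually threatens to flip the ranking of candidate policies, saving evaluations otherwise.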
Evolutionary reinforcement learning of artificial neural networks
In this article we describe EANT2, Evolutionary Acquisition of Neural Topologies, Version 2, a method that creates neural networks by evolutionary reinforcement learning. The structure of the networks is developed using mutation operators, starting from a minimal structure. Their parameters are optimised using CMA-ES, Covariance Matrix Adaptation Evolution Strategy, a derandomised variant of ev...
Reinforcement Learning in Repeated Interaction Games
We study long run implications of reinforcement learning when two players repeatedly interact with one another over multiple rounds to play a finite action game. Within each round, the players play the game many successive times with a fixed set of aspirations used to evaluate payoff experiences as successes or failures. The probability weight on successful actions is increased, while failures ...
Self-Organisation of Neural Topologies by Evolutionary Reinforcement Learning
In this article we present EANT, “Evolutionary Acquisition of Neural Topologies”, a method that creates neural networks (NNs) by evolutionary reinforcement learning. The structure of NNs is developed using mutation operators, starting from a minimal structure. Their parameters are optimised using CMA-ES. EANT can create NNs that are very specialised; they achieve a very good performance while b...
Publication year: 2009